1.  Introduction


Modern criminal networks constantly change to maintain secrecy, recruit members, and coordinate activities. Previous research has “focused on analyzing static networks that do not change over time... (when) in real life many networks are inherently dynamic” (1).

By incorporating dynamic trends, one can discover important elements of the network and disrupt harmful plans (2). This research provides a potential approach in which analysts compare different centrality values and predict how key players change over time (3, 4).

The purpose of this report is to (1) identify and visualize how a network changes over time, (2) calculate centrality measures of a dynamic network, (3) evaluate prediction methods to forecast network behavior, and (4) examine how a network instantaneously changes when a node is removed. In section 2, I examine the theory behind a visualization algorithm, centrality measures, and prediction methods. In section 3, I apply this theory to a case study of the Ali Baba data set. Lastly, in section 4, I identify continued efforts with this project and future work in the field.

To compute the measures, I used the open source R programming language and environment. Over the past year I have been teaching myself R, and I wrote more than 400 lines of code for this project. Sample code from Stanford University (5) served as a starting point. This work was supplemented by the “igraph” package, which implements algorithms and additional functions for social network analysis (SNA) (6). Using this code, I was able to visualize the network and calculate centrality measures.


2.  Network  Theory 


For this report, the Fruchterman-Reingold algorithm was chosen for its strengths in visualizing large undirected networks using a force-directed approach. Its advantages are flexibility, a simple structure, and an interactive nature. The resulting layout also tends to make edges equal in length and reduces the number of intersecting edges. However, the algorithm has a high runtime for extremely large systems. Fruchterman-Reingold assigns forces as if the vertices were electrically charged particles and the edges were springs. Equation 1 represents the energy of the physical system; vertex positions are adjusted repeatedly until equilibrium is achieved. The first term is the attraction between connected vertices. The second is the repulsion between pairs of different vertices (5, 7).

\[ U(p) = \sum_{\{u,v\} \in E} \lVert p(u) - p(v) \rVert - \sum_{\{u,v\} \in V^{(2)}} \ln \lVert p(u) - p(v) \rVert \tag{1} \]
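
As a hedged illustration of the energy described above, the sketch below evaluates it for a given two-dimensional layout. It is written in Python rather than the R used for this project, and the function name `layout_energy`, the toy graph, and the coordinates are illustrative assumptions. It assumes the attraction term is the summed length of the edges and the repulsion term is the negative summed log distance over all vertex pairs, so a lower value indicates a better layout under that assumption.

```python
import math
from itertools import combinations

def layout_energy(positions, edges):
    """Energy of a 2-D layout: edge-length attraction minus log-distance repulsion.

    positions: dict mapping vertex -> (x, y); vertices must not coincide.
    edges: iterable of (u, v) pairs for connected vertices.
    """
    def dist(u, v):
        (x1, y1), (x2, y2) = positions[u], positions[v]
        return math.hypot(x1 - x2, y1 - y2)

    # First term: attraction between connected vertices
    attraction = sum(dist(u, v) for u, v in edges)
    # Second term: repulsion between every pair of different vertices
    repulsion = sum(math.log(dist(u, v)) for u, v in combinations(positions, 2))
    return attraction - repulsion

# Toy example (made-up graph): a and b are tied, c is unconnected
edges = [("a", "b")]
good = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0)}  # tied pair close together
bad = {"a": (0.0, 0.0), "b": (5.0, 0.0), "c": (1.0, 0.0)}   # tied pair far apart
```

Comparing `layout_energy(good, edges)` with `layout_energy(bad, edges)` shows the layout that keeps tied vertices close scoring lower, which is the behavior a force-directed layout is driving toward.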

In a network, individual people are represented as vertices and the relations between them as edges (4, 8). This report examines the undirected connections between vertices, or in simpler terms, the association between people (9). I investigate the density of a network and examine four key centrality measures (8, 10) used in network analysis. The density of a network (equation 2) for binary data is defined as the number of edges that exist divided by the total number of possible edges for the network (11-13). Density relates to the speed at which information diffuses between the actors (11). Criminal networks typically stay decentralized enough to remain secret, but dense enough to enable coordination (12).


\[ \text{Density} = \frac{\lvert E \rvert}{\lvert V \rvert (\lvert V \rvert - 1)/2} \tag{2} \]
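
The density definition can be sketched in a few lines. This is a Python illustration rather than the R code used for the project; the function name `density` and the toy edge lists are assumptions, and the network is treated as undirected and binary, so repeated or reversed pairs count once.

```python
def density(num_vertices, edges):
    """Share of possible undirected ties that are actually present."""
    if num_vertices < 2:
        return 0.0
    # Treat (a, b) and (b, a) as the same tie and ignore self-loops
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    possible = num_vertices * (num_vertices - 1) / 2
    return len(unique_edges) / possible

# Toy examples (made-up graphs on three vertices)
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path = [("a", "b"), ("b", "c")]
```

For the triangle every possible tie exists, so the density is 1; the path uses two of the three possible ties.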

Degree centrality (equation 3) is also known as the “measure of activity” of a node (2). It counts how many people have direct relationships with an individual (4, 5, 10), normalized by the n - 1 other nodes in a network of n nodes. Degree centrality is a fair approximation of the influence, prominence, or prestige of a node. Put simply, the more ties a node has (and hence the higher its degree centrality), the more powerful the person is (2, 11).

\[ C_D(u) = \frac{\deg(u)}{n - 1} \tag{3} \]
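
A minimal Python sketch of normalized degree centrality follows (the project itself used R). The adjacency-dictionary representation, the function name, and the star-shaped toy network are assumptions made for illustration.

```python
def degree_centrality(adj):
    """Number of direct ties per node, divided by the n - 1 possible ties.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}

# Toy star network (made-up): one hub tied to three peripheral members
star = {
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}
```

In the star, the hub is tied to every other node and reaches the maximum value of 1, while each peripheral member has one tie out of three possible.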


Closeness centrality (equation 4) is based on the distance, or “number of steps,” d(u, v) from each node (u) to all other nodes (v) in the network (1, 4, 5, 10, 11). Actors that are close to others are considered more important to the network (11, 12). Thus, those with a high closeness centrality are often interpreted as leaders of the network (2).


\[ C_C(u) = \left( \frac{\sum_{v \neq u} d(u, v)}{n - 1} \right)^{-1} \tag{4} \]
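
The sketch below computes closeness centrality with a breadth-first search, assuming the common form in which n - 1 is divided by the sum of shortest-path distances, so that closer actors score higher. It is Python rather than the project's R code; the function name, the adjacency-dictionary input, and the choice to return 0 for nodes that cannot reach the whole network are assumptions.

```python
from collections import deque

def closeness_centrality(adj, u):
    """(n - 1) divided by the total number of steps from u to every other node."""
    # Breadth-first search gives shortest-path distances in an unweighted graph
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    if total == 0 or len(dist) < len(adj):
        # Assumption of this sketch: isolated or partly unreachable nodes score 0
        return 0.0
    return (len(adj) - 1) / total

# Toy path network (made-up): a - b - c
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

The middle node of the path is one step from everyone and gets the highest score; the end nodes need three steps in total to reach the other two.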


Betweenness centrality (equation 5) measures the share of shortest paths (p_uv) between pairs of vertices u and v that pass through a specific vertex (w) (1, 4, 5, 10, 12). This value identifies gateways between different subgroups and indicates influence over the flow of information (2).

\[ C_B(w) = \frac{2 \sum_{u<v} p_{uv}(w) / p_{uv}}{(n-1)(n-2)} \tag{5} \]
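
One way to compute normalized betweenness is Brandes' accumulation scheme, sketched below in Python (the project itself used R and igraph, so this is not the author's implementation). The function name and adjacency-dictionary input are assumptions. Running the search from every node counts each unordered pair twice, which supplies the factor of 2 in the numerator.

```python
from collections import deque

def betweenness_centrality(adj):
    """Normalized betweenness for an undirected, unweighted graph."""
    nodes = list(adj)
    n = len(nodes)
    if n < 3:
        return {v: 0.0 for v in nodes}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        sigma = {v: 0 for v in nodes}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # score already holds 2 * (sum over unordered pairs)
    return {v: score[v] / ((n - 1) * (n - 2)) for v in nodes}

# Toy path network (made-up): a - b - c
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

In the path, the only shortest route between the two end nodes runs through the middle node, which therefore acts as the sole gateway.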

Lastly,  eigenvector  centrality  (equation  6)  represents  how  close  an  actor  is  to  other  actors  who 
are  important.  An  actor  can  acquire  high  eigenvector  centrality  by  being  connected  to  a  lot  of 
other  people  or  by  being  connected  to  others  who  are  highly  central  (2,  5, 10).  This  centrality  is 
identified  by  some  as  the  cohesiveness  of  the  group  or  the  connectedness  of  an  individual  node 
(4).  When  the  eigenvector  centrality  value  is  ranked  for  nodes  within  the  network,  it  is  called  a 
“node’s  network  importance.” 

\[ x_i = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij} x_j \tag{6} \]
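
Eigenvector centrality is usually obtained by iteration rather than by solving the equation directly. The Python sketch below (not the project's R code) uses power iteration on the adjacency structure plus the identity, a shift that leaves the eigenvector unchanged but avoids oscillation on bipartite graphs. The function names, the scaling of the top score to 1, and the toy network are assumptions, and the graph is assumed to be connected.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration; scores are scaled so the most central node has value 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

def rank_by_centrality(scores):
    """Nodes ordered from most to least central (rank 1 first)."""
    return sorted(scores, key=scores.get, reverse=True)

# Toy star network (made-up): one hub tied to three peripheral members
star = {
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}
```

Sorting the scores with `rank_by_centrality` gives the kind of ranking that is tracked over time when eigenvector centrality is reported as a position rather than a raw value.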

Now that I have explained the centrality measures used, I move on to how to predict them. Scientists and analysts agree that human contact networks, although seemingly random, are predictable (1). If one can forecast network behavior, one can stay a step ahead of adversaries (2). An analyst can use this information to determine when a network is becoming too dangerous, decide whether intervention is warranted, and pick out the key players. Table 1 lists a few of the prediction methods that are investigated.

Table 1. Methods for predicting centrality measures (7).

Method                     Description
Last                       The node’s most recent centrality value
Uniform Moving Average     The average of ‘r’ past centrality values
Weighted Moving Average    The most recent value weighted highest, with decreasing weights for older values
Polynomial Regression      Model of degree three with epsilon term less than 0.2
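
The first three methods in Table 1 are simple enough to sketch directly. The Python below is illustrative only (the project used R); the function names and the sample history are made up, the window size r is assumed to be at least 1, and the linearly decreasing weights in the weighted average are an assumption, since the table does not specify the weighting scheme.

```python
def predict_last(history):
    """Last: reuse the most recent observed centrality value."""
    return history[-1]

def predict_uniform_ma(history, r):
    """Uniform moving average over the r most recent values (r >= 1)."""
    window = history[-r:]
    return sum(window) / len(window)

def predict_weighted_ma(history, r):
    """Weighted moving average: newest value weighted highest (r >= 1).

    Assumption: weights decrease linearly, so the oldest value in the
    window gets weight 1 and the newest gets weight len(window).
    """
    window = history[-r:]
    weights = range(1, len(window) + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)

# Made-up monthly degree centrality values for one node
degree_history = [0.10, 0.20, 0.30, 0.40]
```

Each function takes the observed values for one node in time order and returns the forecast for the next period, so the three can be compared on the same history.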

To  analyze  the  individual  methods,  the  error  is  computed  between  the  centrality  value  of  interest 
and  the  predicted  value  (equation  7).  It  is  important  to  remember  that  there  is  no  single  best 
prediction  method  for  all  centrality  measures  or  data  sets. 

\[ \text{Error} = \frac{\sum \lvert C(u) - \hat{C}(u) \rvert}{\cdots} \tag{7} \]
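
As a hedged sketch of the error computation, the Python below sums the absolute differences between observed and predicted centrality values. The denominator of equation 7 is not legible in this copy, so dividing by the number of compared values (a mean absolute error) is an assumption, as are the function name and the sample numbers.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted centrality values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    # Assumption: normalize by the number of values compared
    return total / len(actual)

# Made-up observed and predicted values
observed = [0.20, 0.40]
forecast = [0.10, 0.50]
```

A lower value means the method tracked the observed centrality more closely, which is how competing prediction methods can be ranked for a given measure.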

When  an  analyst  sees  a  network  that  is  becoming  increasingly  dangerous,  sometimes  network 
disruption  is  needed.  In  network  theory,  nodes  with  the  highest  betweenness  are  called  bridges 
and  those  with  the  highest  degree  are  called  hubs.  Scale-free  networks,  where  the  degree 
distribution  follows  a  power  law,  are  vulnerable  to  both  bridge  and  hub  removals  (13).  The  case 
study  examines  several  interesting  findings  regarding  prediction  and  node  removal. 


3.  Case  Study 


The  Ali  Baba  data  set  was  originally  developed  in  2003  by  the  National  Security  Agency  (NSA) 
to  test  visualization  software.  The  initial  data  set  contained  752  messages  that  followed  the 
actions  of  a  fictitious  terrorist  network  centralized  in  southeast  England.  The  members  of  the 
suspected  network  plan  to  bomb  a  water  treatment  facility  as  revenge  following  an  outbreak  of 
cholera  among  Egyptian  school  children  (14,  15).  Due  to  the  unclassified  nature  and  size  of  the 
data  set,  Ali  Baba  is  commonly  used  as  a  testbed  for  SNA  technology  (15). 

Figure  1  shows  the  Fruchterman-Reingold  visualization  for  the  Ali  Baba  network  from  May  to 
November.  It  is  important  to  notice  the  increasing  number  of  edges  and  links  as  well  as  the 
visibility  of  central  members. 

Figure 1. Fruchterman-Reingold network evolution.

First, I look at the density of the entire Ali Baba network from May to November (figure 2). In June, the network uses 11% of all possible ties, and in October, only 6%. This suggests that the members of the network keep their interaction to a minimum, which is expected from this sort of criminal network.

Figure 2. Density of the Ali Baba network.
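
A density-over-time series like this one can be produced by splitting the message records into one network per month. The Python sketch below is illustrative and is not the project's R code: the record format (month, sender, receiver), the generic member names, and the choice to treat each month as a separate snapshot containing only the members active in it are all assumptions.

```python
from collections import defaultdict

def monthly_snapshots(messages):
    """Group (month, sender, receiver) records into one undirected edge set per month."""
    snapshots = defaultdict(set)
    for month, sender, receiver in messages:
        if sender != receiver:
            # frozenset makes (A, B) and (B, A) the same undirected tie
            snapshots[month].add(frozenset((sender, receiver)))
    return dict(snapshots)

def snapshot_density(edge_set):
    """Density over the members that appear in the snapshot."""
    vertices = set().union(*edge_set) if edge_set else set()
    n = len(vertices)
    if n < 2:
        return 0.0
    return len(edge_set) / (n * (n - 1) / 2)

# Made-up message log with generic member names
messages = [
    ("May", "A", "B"),
    ("May", "B", "A"),
    ("June", "A", "B"),
    ("June", "B", "C"),
    ("June", "C", "D"),
]
density_by_month = {m: snapshot_density(e) for m, e in monthly_snapshots(messages).items()}
```

In this toy log the network grows from two members to four between May and June while its density falls, the same qualitative pattern of growth with sparser ties.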

Now I examine the centrality measures over time of several key members of the Ali Baba network. Figure 3a shows a large spike in closeness centrality from October to November, but at this point it is uncertain what that means. Additionally, all individuals’ closeness centrality measures follow the same general pattern. Using this one centrality measure alone, I was unable to distinguish between the key players.

Next, I examine degree centrality, or the “measure of activity.” Figure 3b is very helpful in distinguishing between players. In the beginning, Tarik has the most power, but this switches between June and July. As the network grows, Imad takes over as the most powerful individual. This piece of information is important when it comes to network disruption and node removal. Ali Ops and Phil maintain their degree measures and current power positions during the last few months.


Betweenness centrality tells us about the flow of information. Figure 3c shows that Imad and Tarik have the greatest influence over the flow of information in November. The most interesting case is Phil, who has zero betweenness centrality. If one were trying to disrupt the chain of information, removing Phil would be a poor choice because it would have no effect on the flow through the network.


Figure 3. Centrality measures of Ali Baba key members. Panels: 3a, closeness centrality; 3b, degree centrality; 3c, betweenness centrality; 3d, rank of eigenvector centrality.

The last centrality measure of interest is eigenvector centrality, or a “node’s network importance.” Tarik, shown in red, is the most important individual from May to July. During July and August there is a big shift in the network eigenvector centralities. Tarik moves from number one to 16th, and in his place are Imad and Ali Ops in the number one and two spots, respectively. It is interesting to note that in September, Ali Baba appears as the number three person in the network. Ali Baba has very few links in the network, so it is notable that he stands out with this metric.

In the previous section, the motivation and techniques used for prediction of dynamic social networks were explained. Figure 4 illustrates an example of these prediction methods for degree centrality. Figure 4a compares the four techniques and the original centrality values for Imad. By computing the error, I determined that a third-degree polynomial is the best fit and the uniform moving average is the worst. Figure 4b applies the polynomial regression to all key players to predict for the month of December. It appears that Imad and Tarik will continue to gain power, while Ali Ops will significantly decrease.
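
A third-degree polynomial forecast of this kind can be sketched as an ordinary least-squares fit followed by evaluation one month ahead. The Python below is an illustration, not the R code used for the project: the function names are assumptions, the monthly values are made up rather than taken from the Ali Baba data, months are coded 1 to 7 for May to November, and the fit is done with the normal equations and Gaussian elimination to keep the example free of external libraries.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial coefficients, lowest power first.

    Needs at least degree + 1 distinct x values.
    """
    m = degree + 1
    # Normal equations (X^T X) c = X^T y for the Vandermonde matrix X
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        known = sum(a[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs

def evaluate(coeffs, x):
    """Value of the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

months = [1, 2, 3, 4, 5, 6, 7]                       # May..November
values = [0.05, 0.12, 0.18, 0.22, 0.25, 0.31, 0.38]  # made-up degree centralities
december_forecast = evaluate(fit_polynomial(months, values), 8)
```

Extrapolating a cubic beyond the observed months can swing sharply, so a forecast like `december_forecast` is best read as a one-step-ahead indication rather than a long-range projection.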


Figure 4. Prediction for degree centrality. Panels: 4a, all degree prediction models... (series: original, last, uniform, polynomial; x-axis: months, May-December; fitted cubic trend line with terms - 0.021x^2 + 0.1053x + 0.0418, cubic coefficient illegible); 4b, degree centrality with December... (series include Imad, Tarik, and Ali Ops; x-axis: months, May-December).

The last thing that I examine is the effect of node removal on the Ali Baba network. Throughout my investigation of centrality measures, Imad stands out as the most important individual. Because he is both a bridge and a hub in this scale-free network, I remove Imad to see the instantaneous effect on the network as a whole and on its centrality measures. Figure 5 shows the removal of Imad in maroon, which also leads to the removal of his direct links and the loss of members whose only ties were with Imad.
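
The removal step can be sketched as deleting the target node, deleting its ties, and then dropping any member left with no ties at all. The Python below is illustrative rather than the project's R code; the function name and the generic toy network are assumptions, and note that this simple version would also drop any node that was already isolated before the removal.

```python
def remove_node(adj, target):
    """Copy of the network without target, its ties, and any nodes left isolated.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    # Drop the target and strip it from everyone else's neighbor set
    pruned = {v: set(nbrs) - {target} for v, nbrs in adj.items() if v != target}
    # Members whose only tie was with the target now have no neighbors
    return {v: nbrs for v, nbrs in pruned.items() if nbrs}

# Made-up network: a hub tied to x, y, and z, with y and z also tied to each other
network = {
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub", "z"},
    "z": {"hub", "y"},
}
after = remove_node(network, "hub")
```

Here `x` disappears along with the hub because its only tie was to the hub, while `y` and `z` survive through their tie to each other. Centrality measures can then be recomputed on `after` to see the immediate effect of the disruption.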


The centrality measure results of the node removal are shown in figure 6. When Imad is removed, the closeness and betweenness centrality drop to almost nothing. Also notice that Ali Ops was one of the players temporarily removed since his only tie was with Imad. The degree centrality of Tarik and Phil is not significantly changed, so one can assume that they maintain most of their ties. Since Ali Ops is removed, he has zero degree centrality. Lastly, eigenvector centrality (figure 6d) shows that after Imad is removed, Tarik steps up as the number one individual in the network. Phil also becomes more important, rising to the eighth-ranking individual.


Figure 6. Result of Imad node removal on centrality measures. Panels: 6a, closeness centrality; 6b, degree centrality; 6c, betweenness centrality; 6d, eigenvector centrality.


4.  Conclusions 


Throughout this summer project, I continued efforts to predict seemingly random criminal network behavior by viewing the networks as dynamic systems. I used the Fruchterman-Reingold algorithm to identify structure, distinguish key players, and understand behavioral roles. I conducted a network trend analysis by looking into degree, closeness, betweenness, and eigenvector centrality. I evaluated several methods of centrality prediction, including polynomial regression and moving averages. Lastly, I looked into the immediate result of removing a key node from a scale-free terrorist network.

Additional prediction research can be done using time series analysis and quality control charts to further understand centrality measures of dynamic networks. Evaluation of other metrics such as embeddedness, reachability, and assortativity (13) can also be included in this analysis. There is a need for in-depth research on node removal and how networks adapt to the disruption. In the future, I plan to incorporate these additional topics with a secondary case study of the more complex Enron data set (16).